Abstract: End-user privacy in smart meter measurements 
is a well-known challenge in the smart grid. The solutions 
offered thus far have been tied to specific technologies such 
as batteries or assumptions on data usage. Existing solutions 
have also not quantified the loss of benefit (utility) that results 
from any such privacy-preserving approach. Using tools from 
information theory, a new framework is presented that abstracts 
both the privacy and the utility requirements of smart meter 
data. This leads to a novel privacy-utility tradeoff problem 
with minimal assumptions that is tractable. Specifically for a 
stationary Gaussian Markov model of the electricity load, it is 
shown that the optimal utility-and-privacy preserving solution 
requires filtering out frequency components that are low in power, 
and this approach appears to encompass most of the proposed 
privacy approaches. 

I. Introduction 

Information collection and dissemination, some of it using 
smart meters, are critical to the smart grid. But information 
about electricity consumption that is collected and harnessed 
for a more efficient and multi-faceted grid may be used for 
purposes beyond electricity consumption, thereby making it 
potentially dangerous to individual privacy. The privacy con- 
sequences of smart grid development are hard to understand 
for two principal reasons: (i) the full range of technological 
capabilities and information extraction possibilities have not 
been laid out, and (ii) our concepts of privacy in this space 
are as yet poorly defined and shifting. Smart meters are an 
indispensable enabler in the context of smart grids, which 
deploy advanced information and communication technology 
to control the electrical grid. 

The main motivations for high-resolution energy usage data 
collection are to forecast load demand and to provide opti- 
mized service to consumers in the form of pricing structure [1]. 
An electricity provider can use this information to facilitate 
more efficient network management, peak load reduction, load 
shaping, and a number of other such uses. However, it has been 
known for some time that the information of appliance use can 
be reconstructed from the overall real-time load using libraries 
of appliance load signatures that could be matched to signals 
found within the noise of a customer's aggregated electricity 
use and a large amount of detail concerning customer usage 
habits can be discerned [2]. [1] cites a list of privacy-sensitive 
characteristics that may be inferred from electricity load data 
ranging from house occupancy to personal habits and routines. 
The NIST Smart Grid Interoperability Panel has also underlined 
the risk to privacy of personal behavior because new 
types of energy use data are created and communicated by 
smart meters, such as unique electric signatures for consumer 
electronics and appliances, thereby opening up further oppor- 
tunities for general invasion of privacy. [3] suggest that there 
will always "be the temptation to sell such information such as 
energy usage or appliance data, either in identifiable customer 
level, anonymized or aggregate form to third parties such as 
marketers seeking commercial gain." Thus, a desired feature 
of privacy design in the smart grid would be "positive-sum, 
not zero-sum" in that it seeks to accommodate all legitimate 
interests and objectives in a fair manner without completely 
sacrificing privacy for utility or vice-versa. 

A typical approach to privacy in smart meter data is 
aggregation along dimensions of space (using neighborhood 
gateways, e.g. [4]), time (using battery storage, e.g. [5]), or 
precision (by noise addition, e.g. [6]). These solutions seek 
to support utility and privacy in different ways; however, 
they do not have a robust theoretical basis for both privacy 
and utility. Such a basis is important for several reasons. 
First, a theoretical abstraction allows us to recast the problem 
in a technology-independent manner - we need a privacy 
framework that not only addresses the capabilities of current 
non-intrusive appliance load monitoring (NALM) techniques but is also 
extensible to future ones. Second, a theoretical framework 
enables us to examine the costs of lost privacy against the 
benefits of data dissemination, namely, the tradeoff between 
privacy and utility. It would be desirable to give each customer 
the ability to decide that tradeoff and also to give the electricity 
provider the ability to incentivize the customer to participate 
in such a bargain by offering interesting points of tradeoff. 
Finally, a theoretical framework for privacy and utility may 
expose points of tradeoff that are unexpected. 

We propose a general theoretical framework that brings 
most current treatments of the privacy-utility tradeoff into a 
single model - it enables us to look at a spectrum of abstract 
privacy-utility choices and enables us to find maximal points 
on such a tradeoff curve. It also suggests new possible ways 
of achieving this tradeoff that have not been considered thus 
far. 

What we have found is that suppressing low power com- 
ponents would be consistent with intuitive notions of privacy 
in smart meter data. At the same time, our utility constraints 
guarantee that the bulk of the energy consumption information 
in the load measurements is retained in the revealed data. This 
suggests that it may indeed be possible to reveal significant 
energy consumption information without also revealing a lot of 
personal information and the resulting tradeoff can be tuned. 
This would be an interesting avenue for further exploration. 
The paper is organized as follows. In Section II we outline 
current approaches to smart meter privacy. In Section III we 
develop our model, metrics, and the privacy-utility tradeoff 
framework, and we illustrate our results in Section IV. 

II. Related Work 

The advantages and usefulness of smart meters in general are 
examined in a number of papers; see for example [7] and the 
references therein. [5] presents a pioneering view of privacy 
of smart meter information: the authors identify the need 
for privacy in a home's load signature as being an inference 
violation (resulting from load signatures of home appliances) 
rather than an identity violation (i.e. loss of anonymity). 
Accordingly, they propose home electrical power routing using 
rechargeable batteries and alternate power sources to moderate 
the effects of load signatures. They also propose three different 
privacy metrics: relative entropy, clustering classification, and 
a correlation/regression metric. However, they do not propose 
any formal utility metrics to quantify the utility-privacy trade- 
off. 

Recently, [8] proposes additional protection through the 
use of a trusted escrow service, along with randomized time 
intervals between the setup of attributable and anonymous data 
profiles at the smart meter. [9] shows, somewhat surprisingly, 
that even without a priori knowledge of household activities or 
prior training it is possible to extract complex usage patterns 
from smart meter data such as residential occupancy and 
social activities very accurately using off-the-shelf statistical 
methods. [4] and [9] propose privacy-enhancing designs using 
neighborhood-level aggregation and cryptographic protocols to 
communicate with the energy supplier without compromising 
the privacy of individual homes. However, escrow services and 
neighborhood gateways support only restricted query types and 
do not completely solve the problem of trustworthiness. [10] 
presents a formal state transition diagram-based analysis of the 
privacy afforded by the rechargeable battery model proposed 
in [5]. However, [10] does not offer a comparable model of 
utility to compare the risks of information leakage with the 
benefits of the information transmitted. 

In [6], the authors present a method of providing differential 
privacy over aggregate queries modeling smart meter measure- 
ments as time-series data from multiple sources containing 
temporal correlations. While their approach has some similar- 
ity to ours in terms of time-series data treatment, their method 
does not seem generalizable to arbitrary query types. On the 
other hand, [11] introduces the notion of partial information 
hiding by introducing uncertainty about individual values in a 
time series by perturbing them. Our method is a more general 
approach to time series data perturbation that guarantees that 
the perturbation cannot be eliminated by averaging. 



III. Our Contributions 

The primary challenge in characterizing the privacy-utility 
tradeoffs for smart meter data is creating the right abstraction 
- we need a principled approach that provides quantitative 
measures of both the amount of information leaked as well as 
the utility retained, does not rely on any assumptions of data 
mining algorithms, and provides a basis for a negotiated level 
of benefit for both consumer and supplier [3]. [10] provides 
the beginnings of such a model: the authors assume that in every 
sampling time instant the net load is either 0 or 1 power unit, 
so that the smart meter readings $X_k$, $k = 1, 2, \ldots$, are 
a discrete-time sequence of binary independent and identically 
distributed values. They model the battery-based filter of [5] 
as a stochastic transfer function that outputs a binary sequence 
$\hat{X}_k$ that tells the electricity provider whether the home is 
drawing power or not at any given moment. The amount 
of information leaked by the transfer function is defined to 
be the mutual information rate $I(X; \hat{X})$ between the random 
variables $X$ and $\hat{X}$. By modeling the battery charging policy 
as a 2-state stochastic transition machine, they show that there 
exist battery policies that result in less information leakage 
than from the deterministic charging policy of [5]. Though 
[10] does not provide a general utility function to go with 
the chosen privacy function and the modeling assumptions are 
extremely simplistic, it nevertheless provides a good starting 
point for our framework. 
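
As a rough illustration of mutual information as a leakage measure for a binary load, the sketch below computes $I(X; \hat{X})$ per sample for a Bernoulli load bit whose reported value is flipped at random. This is a memoryless simplification assumed here for illustration only; it is not the stateful battery policy analyzed in [10], and the function names and the flip-probability parameter are hypothetical.

```python
import math


def mutual_information_bits(joint):
    """I(X; X_hat) in bits for a joint pmf given as joint[x][x_hat]."""
    p_x = [sum(row) for row in joint]
    p_xhat = [sum(col) for col in zip(*joint)]
    total = 0.0
    for x, row in enumerate(joint):
        for xh, p in enumerate(row):
            if p > 0.0:
                total += p * math.log2(p / (p_x[x] * p_xhat[xh]))
    return total


def binary_channel_joint(p_on, flip_prob):
    """Joint pmf of a Bernoulli(p_on) load bit and its reported bit.

    Assumption for illustration: the reported bit equals the load bit
    flipped independently with probability flip_prob.
    """
    return [
        [(1.0 - p_on) * (1.0 - flip_prob), (1.0 - p_on) * flip_prob],
        [p_on * flip_prob, p_on * (1.0 - flip_prob)],
    ]


if __name__ == "__main__":
    for flip in (0.0, 0.1, 0.5):
        leak = mutual_information_bits(binary_channel_joint(0.5, flip))
        print(f"flip probability {flip}: leakage {leak:.4f} bits per sample")
```

With no flipping the full bit is revealed, and with a flip probability of one half the reported bit is independent of the load, so nothing is leaked; intermediate values fall in between.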

In our model, we assume that the load measurements are 
sampled (at an appropriate frequency) from a smart meter, 
that they are real-valued, and that they can be correlated (which 
models the temporal memory of both appliances and human usage 
patterns). Rather than assume any specific transfer function, 
we assume an abstract transfer function which maps the input 
load measurements $X$ into an output sequence $\hat{X}$. As in [10], 
we assume a mutual information rate as a metric for privacy 
leakage; however, we allow for the fact that a large space of 
(unknown to us) inferences can be made from the meter data: 
we model the inferred data as a random variable $Y$ correlated 
with the measurement variable $X$. Thus, the privacy leakage 
is the mutual information between $Y$ and $\hat{X}$. We also provide 
an abstract utility function which measures the fidelity of the 
output sequence $\hat{X}$ by limiting the Euclidean distance (mean 
square error) between $X$ and $\hat{X}$. Using these abstractions 
and tools from the theory of rate distortion we are able to 
meet all our requirements for a general but tractable privacy- 
utility framework: the privacy and utility requirements provide 
opposing constraints that expose a spectrum of choices for 
trading off privacy for utility and vice-versa. 

A. Model 

We write $x_t$, $t = 1, 2, \ldots, n$, to denote the sampled load 
measurements from a smart meter. In general, the $x_t$ are complex 
valued, corresponding to the real and reactive measurements, 
and are typically vectors for multi-phase systems [2]. For 
simplicity and ease of presentation, we model the meter 
measurements as a sequence of real-valued scalars (for example, 
such a model applies to two-phase 120 V appliances for which 
one of the two phase components is zero). 

For appropriately small sampling intervals, the smart me- 
ter time-series data that result from sampling the underly- 
ing continuous-time continuous-amplitude processes can be 
viewed as being generated by a random source with memory. 
The memory models the continuity and the effect of both 
short-term and long-term correlations in the load measure- 
ments. The short term correlations typically model the effect 
of the set of appliances in use over the said duration while 
the long term correlations model the long term power usage 
pattern of the human user. We model the continuous valued 
smart meter data as a sequence $\ldots, X_{k-1}, X_k, X_{k+1}, \ldots$ of 
random variables $X_k \in \mathcal{X}$, $-\infty < k < \infty$, generated by a 
stationary continuous valued source with memory. Specifically, 
we model the continuous valued discrete-time smart meter data 
as a sequence $\ldots, X_{k-1}, X_k, X_{k+1}, \ldots$ of Gaussian random 
variables $X_k \in \mathcal{X}$, $k = 0, \pm 1, \pm 2, \ldots$, generated by a stationary 
Gaussian source with memory captured via the autocorrelation 
function 

$$c_{XX}(m) = E\left[X_k X_{k+m}\right], \quad m = 0, \pm 1, \pm 2, \ldots \tag{1}$$


The assumption of normal distribution for total load is a 
simplification from empirical observations [12] that the power 
consumption pattern of a typical appliance in the on state is 
approximately Gaussian. 
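
In practice the autocorrelation function in (1) would have to be estimated from recorded samples. The following is a minimal sketch of the standard biased sample estimator, assuming a zero-mean, stationary block of readings; the function name is hypothetical and no estimator is prescribed by the model itself.

```python
def sample_autocorrelation(x, max_lag):
    """Biased sample estimate of c_XX(m) = E[X_k X_{k+m}] for m = 0..max_lag.

    Assumes x is one block of zero-mean, stationary readings.
    """
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the block length")
    return [
        sum(x[k] * x[k + m] for k in range(n - m)) / n
        for m in range(max_lag + 1)
    ]


if __name__ == "__main__":
    readings = [1.0, 2.0, 3.0, 4.0]
    print(sample_autocorrelation(readings, 2))
```

Dividing by the block length $n$ at every lag (rather than by $n - m$) keeps the estimated correlation matrix positive semidefinite, at the cost of some bias at large lags.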

B. Utility and Privacy Metrics 

Since continuous amplitude sources cannot be transmitted 
losslessly over finite capacity links, a sampled sequence of 
$n$ load measurements $X^n$ is compressed before transmission. 
In general, however, even if the sampled measurements were 
quantized a priori, i.e., take values in a discrete alphabet, there 
may be a need to perturb (distort) the data in some way to 
guarantee a measure of privacy. However, such a perturbation 
also needs to maintain a desired level of fidelity. 

Intuitively, utility of the perturbed data is high if any 
function computed on it yields results similar to those from 
the original data; thus, the utility is highest when there is 
no perturbation and goes to zero when the perturbed data is 
completely unrelated to the original. Accordingly, our utility 
metric is an appropriately chosen average 'distance' distortion 
function between the original and the perturbed data. 

Privacy, on the other hand, is maximized when the perturbed 
data is completely independent of the original. Our privacy 
metric measures the difficulty of inferring any private 
information of the data collector's choice, defined as a sequence 
$\{Y_k\}$ of random variables $Y_k \in \mathcal{Y}$, $-\infty < k < \infty$, which 
is correlated with and can be inferred from the revealed 
data. The random sequence $\{Y_k\}$ for all $k$, along with the 
joint distribution $p_{X^n Y^n}$, mathematically captures the space 
of all inferences that can be made from the measurements. 
We quantify the resulting privacy loss as a result of revealing 
perturbed data via the mutual information between the two 
data sequences. 



As an aside, we note here that our model of privacy is 
between a single user (household) and the electricity provider. 
It does not consider the leakage possibilities of comparing the 
perturbed data from two or more different users. On the other 
hand it can address the possibility of side-information such as 
income level of the user that may cause further information 
leakage. If we know the statistics of the side-information, then 
we can incorporate the possible leakage into the model and 
derive the consequent modified privacy-utility tradeoff. For 
simplicity we ignore the side-information aspect in this paper. 

C. Perturbation: Encoding and Decoding 

Encoding: We assume that a meter collects $n \gg 1$ 
measurements in an interval of time prior to communication and 
that $n$ is large enough to capture the source's memory. The 
encoding function is then a mapping of the resulting source 
sequence $X^n = (X_1, X_2, X_3, \ldots, X_n)$, where $X_k \in \mathbb{R}$ for all 
$k = 1, 2, \ldots, n$, to an index $W_n \in \mathcal{W}_n$ given by 

$$F_E : \mathcal{X}^n \to \mathcal{W}_n \equiv \{1, 2, \ldots, M_n\} \tag{2}$$

where each index represents a quantized sequence. 

Decoding: The decoder (at the data collector) computes an 
output sequence $\hat{X}^n = (\hat{X}_1, \hat{X}_2, \hat{X}_3, \ldots, \hat{X}_n)$, $\hat{X}_k \in \mathbb{R}$ for 
all $k$, using the decoding function 

$$F_D : \mathcal{W}_n \to \hat{\mathcal{X}}^n. \tag{3}$$



The encoder is chosen such that the input and output sequences 
achieve a desired utility given by an average distortion 
constraint 

$$D_n = \frac{1}{n} \sum_{k=1}^{n} E\left[\left(X_k - \hat{X}_k\right)^2\right] \tag{4}$$

and a constraint on the information leakage about the desired 
sequence $\{Y_k\}$ from the revealed sequence $\{\hat{X}_k\}$, quantified 
via the leakage function 

$$L_n = \frac{1}{n} I\left(Y^n; \hat{X}^n\right) \tag{5}$$

where $E[\cdot]$ denotes the expectation over the joint distribution 
of $X^n$ and $\hat{X}^n$ given by $p_{X^n \hat{X}^n}(x^n, \hat{x}^n) = p_{X^n}(x^n)\, p_t(\hat{x}^n | x^n)$, 
where $p_t(\hat{x}^n | x^n)$ is a conditional pdf on $\hat{x}^n$ given $x^n$. The 
mean-square error (MSE) distortion function chosen in (4) is 
typical for Gaussian distributed real-valued data as a measure 
of the fidelity of the perturbation (encoding). 
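
To make the roles of the encoder, the decoder, and the distortion in (4) concrete, the sketch below uses a per-sample uniform quantizer as a stand-in for the pair $(F_E, F_D)$ and evaluates the empirical mean-square error on one block. The quantizer is an assumption made purely for illustration: the framework allows arbitrary block encoders, and the names `encode`, `decode`, and `average_distortion` are hypothetical.

```python
def encode(x, step):
    """Stand-in for F_E: map each sample to an integer quantizer index."""
    return [round(v / step) for v in x]


def decode(indices, step):
    """Stand-in for F_D: map each index back to a reconstruction value."""
    return [step * i for i in indices]


def average_distortion(x, x_hat):
    """Empirical counterpart of (4): mean squared error over one block."""
    if len(x) != len(x_hat):
        raise ValueError("sequences must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


if __name__ == "__main__":
    load = [0.3, 1.2, 2.6]
    step = 0.5
    revealed = decode(encode(load, step), step)
    print(revealed, average_distortion(load, revealed))
```

A coarser step lowers the number of distinct indices (and hence the rate) while raising the distortion, which is the basic tension the tradeoff region formalizes.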

Note that $D_n$ and $L_n$ are functions of the number of 
measurements $n$ and for stationary sources converge to limiting 
values [13]. Let $D$ and $L$ denote the corresponding limiting 
values for utility and privacy, respectively, i.e., 

$$D = \lim_{n \to \infty} D_n \quad \text{and} \quad L = \lim_{n \to \infty} L_n. \tag{6}$$




D. Utility-Privacy Tradeoff Region 

Formally, the utility-privacy tradeoff region $\mathcal{T}$ is defined as 
follows. 

Definition 1: The smart meter utility-privacy tradeoff 
region $\mathcal{T}$ is the set of all $(D, L)$ pairs for which there 
exists a coding scheme given by (2) and (3) with parameters 
$(n, M_n, D_n + \epsilon, L_n + \epsilon)$ satisfying (4) and (5) for $n$ sufficiently 
large and $\epsilon > 0$. 

Rate-Distortion-Leakage: The above utility-privacy tradeoff 
problem does not explicitly bound the number $M_n$ of encoded 
(quantized) sequences. An explicit constraint 

$$M_n \le 2^{n(R_n + \epsilon)} \tag{7}$$

results in a rate-distortion-leakage (RDL) tradeoff problem for 
which the feasible region is defined as follows. Let 
$R = \lim_{n \to \infty} (\log M_n)/n$. 

Definition 2: The rate-distortion-leakage tradeoff region 
$\mathcal{R}_{RDL}$ is the set of all $(R, D, L)$ tuples for which there exists 
a coding scheme given by (2), (3), and (7) with parameters 
$(n, M_n, D_n + \epsilon, L_n + \epsilon)$ satisfying (4) and (5) for $n$ sufficiently 
large and $\epsilon > 0$. The function $\lambda(D)$ quantifies the minimal 
leakage achievable for a feasible distortion $D$ such that the set 
of all $(R, D, \lambda(D))$ are the boundary points of $\mathcal{R}_{RDL}$. 

Theorem 1: $\mathcal{T} = \{(D, L) : (R, D, L) \in \mathcal{R}_{RDL},\ D \in [0, D_{\max}],\ L \ge \lambda(D)\}$. 

Proof sketch: The crux of our argument is the fact that 
for any feasible utility (distortion) $D$, choosing the minimum rate 
$R(D, \lambda(D))$ ensures that the least amount of information is 
revealed about the source via the reconstructed variable. This 
in turn ensures that the minimal leakage $\lambda(D)$ of the correlated 
sequence $Y^n$ is achieved for that utility. For the same utility 
constraint, since such a rate requirement is not a part of the 
utility-privacy model, the resulting maximal privacy achieved 
is at most as large as that in $\mathcal{R}_{RDL}$. 

E. Rate-Distortion-Leakage Tradeoff 

We now use Theorem 1 to precisely quantify the utility-privacy 
tradeoff via the RDL tradeoff region. The proof is 
a direct generalization of the RDL region for memoryless 
sources (see, for example, [14], [15]), and hence, is omitted 
for lack of space. Intuitively, the proof follows from upper 
and lower bounding the minimal communication rate $R$ as a 
function of $D$ and $L$ and the minimal leakage rate $\lambda$ as a 
function of $D$. 

Theorem 2: The rate-distortion-leakage region for a source 
with memory subject to distortion and leakage constraints in 
(4) and (5) is given by the rate-distortion and minimal leakage 
functions 

$$R(D, L) = \lim_{n \to \infty} \inf_{p(x^n, y^n)\, p(\hat{x}^n | x^n)} \frac{1}{n} I\left(X^n; \hat{X}^n\right) \tag{8}$$

$$\lambda(D) = \lim_{n \to \infty} \inf_{p(x^n, y^n)\, p(\hat{x}^n | x^n)} \frac{1}{n} I\left(Y^n; \hat{X}^n\right). \tag{9}$$

The utility-privacy tradeoff is captured by $\lambda(D)$, which is the 
minimal privacy leakage for a desired distortion (utility) $D$. 

Remark 1: The Markov relationship $Y^n - X^n - \hat{X}^n$ is 
captured via the set of all distributions in (8) and (9) which 
minimize $R(D, L)$ and $\lambda(D)$. 

Corollary 1: For $Y_k = X_k$, for all $k$, i.e., for the case 
in which the actual measurements need to be undisclosed, 
$\lambda(D) = R(D, L) = R(D)$, where $R(D)$ is the rate-distortion 
function for the source. 



In general, the optimal distribution minimizing the rate 
subject to both the distortion and leakage constraints depends 
on the joint distribution of the measurement and inference 
sequences. Modeling this relationship is, in general, not 
straightforward or known a priori. Given this limitation, we 
consider a simple linear inference model given by 

$$Y_k = \alpha_k X_k + Z_k, \quad \text{for all } k, \tag{10}$$

where $Z_k \sim \mathcal{N}(0, 1)$ is independent of $X_k$, and the $\alpha_k$ are 
constants. In this paper, we limit our results to these models 
to simplify our analysis and develop the intuition that can 
eventually lead us to develop complete solutions for a more 
general inference model. The following theorem captures our 
result. 

Theorem 3: The utility-privacy tradeoff for smart meter 
measurements modeled as a Gaussian source with memory 
with $Y_k = \alpha_k X_k + Z_k$, for all $k$, is given by the leakage 
function $\lambda(D)$ which results from choosing the distribution 
$p(\hat{x}^n | x^n)$ as the rate-distortion (without privacy) optimal 
distribution. 

Proof: The proof follows directly from noting that, for a 
given jointly Gaussian distribution of the source and correlated 
hidden sequence, $p_{X^n Y^n}$, the infimum in (8) and (9) is strictly 
over the space of conditional distributions of the revealed 
sequence given the original source sequence as a result of 
the Markov chain relationship $Y^n - X^n - \hat{X}^n$. Expanding 
the leakage as $I(Y^n; \hat{X}^n) = h(Y^n) - h(Y^n | \hat{X}^n)$, and using 
the fact that for correlated Gaussian processes $Y_k = \alpha_k X_k + Z_k$, 
for all $k$, where $\{Z_k\}$ is a sequence independent of $\{X_k\}$ 
and $\alpha_k$ is a constant for each $k$, one can show that the jointly 
Gaussian distribution of $X^n$ and $\hat{X}^n$ which minimizes (8) also 
minimizes (9). ■ 
Remark 2: Theorem 3 simplifies the development of the 
RDL region for Gaussian sources with memory, for which the 
rate-distortion function is known and lends itself 
to a straightforward practical implementation that we discuss 
in the following section. 
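
For intuition about the linear inference model in (10), the sketch below works out the scalar memoryless case, which is an assumption made here for illustration (the theorem itself concerns sources with memory). With $X \sim \mathcal{N}(0, \sigma^2)$, the rate-distortion optimal Gaussian test channel $X = \hat{X} + N$, $N \sim \mathcal{N}(0, d)$ independent of $\hat{X}$, and $Y = \alpha X + Z$ with unit-variance $Z$, the conditional variance of $Y$ given $\hat{X}$ is $\alpha^2 d + 1$, so the leakage is $\frac{1}{2} \log \frac{\alpha^2 \sigma^2 + 1}{\alpha^2 d + 1}$. The function names and the choice of base-2 logarithms are hypothetical.

```python
import math


def scalar_rate_bits(var_x, d):
    """Rate I(X; X_hat) of the Gaussian test channel, valid for 0 < d <= var_x."""
    return 0.5 * math.log2(var_x / d)


def scalar_leakage_bits(alpha, var_x, d):
    """Leakage I(Y; X_hat) for Y = alpha * X + Z with Z ~ N(0, 1).

    Assumes the scalar Gaussian test channel with distortion 0 < d <= var_x.
    """
    var_y = alpha ** 2 * var_x + 1.0
    var_y_given_xhat = alpha ** 2 * d + 1.0
    return 0.5 * math.log2(var_y / var_y_given_xhat)


if __name__ == "__main__":
    for d in (0.25, 0.5, 1.0):
        print(d, scalar_rate_bits(1.0, d), scalar_leakage_bits(2.0, 1.0, d))
```

In this scalar setting the leakage never exceeds the rate, as the Markov chain requires, and it falls to zero when the distortion reaches the source variance.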

F. Rate-Distortion for Gaussian Sources with Memory 

In general, the rate distortion functions for sources with 
memory are not straightforward to compute. However, for 
Gaussian sources, the rate-distortion function R(D) (without 
the additional privacy constraint) is known and can be obtained 
via a transformation of the correlated source sequence $X^n$ to 
its eigen-space in which the transformed sequence is a 
collection of independent random variables with, in general, 
different variances. 

A standard approach to analyze correlated data is to project 
the data to an orthogonal basis in which the leakage and 
distortion constraints remain invariant. Since the data is random, we 
project on to the principal axes of the $n \times n$ correlation matrix 
$C_{XX}$, whose entry in the $i$th row and $j$th column is $c_{XX}(i - j)$ 
defined in (1), for which the mean-square error (Euclidean 
distance) function and the mutual information leakage are 
invariant. Thus, while the constraints for the original and 
transformed measurements are the same, the advantage of the 
transformation is that the resulting measurements in any block 
of length $n$ are statistically independent. 

Fig. 1. The PSD of $\{X_k\}$. The area below the curve and the horizontal line 
is equal to $D$. 
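
The structure of the correlation matrix can be sketched as follows: its entries depend only on the lag $i - j$, and the Fourier transform of the lag sequence, sampled at $n$ discrete frequencies, gives the per-frequency powers. The lag values below are those of the example in Section IV; the function names are hypothetical, and the sampled transform is only the usual large-$n$ approximation to the exact eigenvalues, not an exact diagonalization.

```python
import math


def toeplitz_correlation(c, n):
    """n x n matrix whose (i, j) entry is c(i - j).

    c maps a non-negative lag to its value; missing lags count as 0.
    """
    return [[c.get(abs(i - j), 0.0) for j in range(n)] for i in range(n)]


def psd_samples(c, n):
    """Samples of sum_m c(m) exp(-i m w) at w = 2 pi f / n, f = 0..n-1.

    The sum is real because c is symmetric in the lag.
    """
    samples = []
    for f in range(n):
        w = 2.0 * math.pi * f / n
        tail = sum(v * math.cos(m * w) for m, v in c.items() if m > 0)
        samples.append(c.get(0, 0.0) + 2.0 * tail)
    return samples


if __name__ == "__main__":
    lags = {0: 1.0, 1: 0.3, 2: 0.4}
    matrix = toeplitz_correlation(lags, 8)
    spectrum = psd_samples(lags, 8)
    print(matrix[0])
    print(spectrum)
```

When $n$ exceeds the largest nonzero lag, the average of the spectral samples equals $c(0)$, the normalized trace of the matrix, which reflects the fact that the transform preserves total power.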

We write $S_X(f)$ to denote the unitary transformation of the 
correlation matrix $C_{XX}$, i.e., $S_X(f)$ is the power spectral 
density (PSD) of the time series process $\{X_k\}$ at discrete 
frequencies $f = 0, 1, 2, \ldots, n - 1$. We henceforth refer to the 
transform domain as the spectral domain in keeping with the 
literature. Similarly, let $S_Y(\omega)$ and $S_{XY}(\omega)$ denote the PSDs 
of the $\{Y_k\}$ and the $\{X_k, Y_k\}$ processes, where $S_{XY}(\omega)$ is the 
transform of the cross-correlation function $c_{XY}(m)$ of the 
two sequences. Let $\phi$ denote the Lagrangian parameter for 
the distortion constraint (4) in the rate minimization problem. 
Explicitly denoting the dependence on the water-level $\phi$, the 
rate-distortion function $R_\phi(D)$ and the average distortion 
function $D(\phi)$ are given by [16] 

$$R_\phi(D) = \int_{-\pi}^{\pi} \max\left(0, \frac{1}{2} \log \frac{S_X(\omega)}{\phi}\right) \frac{d\omega}{2\pi} \tag{11}$$

$$D(\phi) = \int_{-\pi}^{\pi} \min\left(S_X(\omega), \phi\right) \frac{d\omega}{2\pi}. \tag{12}$$

Note that the water-level $\phi$ is determined by the desired 
average distortion $D(\phi) = D$. Thus, $R(D)$ for a Gaussian 
source with memory can be expressed as an infinite sum of the 
rate-distortion functions for independent Gaussian variables, 
one for each angular frequency $\omega \in [-\pi, \pi]$. The 
"water-level" $\phi$ captures the average time-domain distortion constraint 
across the spectrum such that the distortion for any $\omega$ is the 
minimum of the water-level and the PSD. The privacy leakage 
$\lambda(D(\phi))$ is then the infinite sum of the information leakage 
about $\{Y_k\}$ for each $\omega$, and is given by 

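$$\lambda(D(\phi)) = \int_{-\pi}^{\pi} \frac{1}{2} \log\left(\frac{S_Y(\omega)}{S_{XY}(\omega)\, g(\omega) + S_Y(\omega)}\right) \frac{d\omega}{2\pi} \tag{13}$$

where $g(\omega) = \left(\min\left(S_X(\omega), \phi\right) / S_X(\omega) - 1\right)$. 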
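
The water-filling expressions for the rate and the average distortion lend themselves to direct numerical evaluation. The sketch below assumes the standard reverse water-filling form for a Gaussian source with PSD $S_X(\omega)$ and water-level $\phi$: distortion $\min(S_X(\omega), \phi)$ and rate $\max(0, \frac{1}{2} \log(S_X(\omega)/\phi))$ at each frequency, both averaged over $[-\pi, \pi]$. The midpoint rule, the base-2 logarithm, and all function names are choices made here for illustration.

```python
import math


def band_average(f, steps=4000):
    """Midpoint-rule average of f over [-pi, pi], i.e. the integral of f dw / (2 pi)."""
    h = 2.0 * math.pi / steps
    return sum(f(-math.pi + (i + 0.5) * h) for i in range(steps)) / steps


def average_distortion(psd, phi):
    """Average distortion at water-level phi: per-frequency distortion is min(S, phi)."""
    return band_average(lambda w: min(psd(w), phi))


def rate_bits(psd, phi):
    """Rate at water-level phi, in bits per sample.

    Only frequencies whose power exceeds phi contribute.
    """
    return band_average(lambda w: max(0.0, 0.5 * math.log2(psd(w) / phi)))


if __name__ == "__main__":
    def white(w):
        return 2.0  # flat PSD: a memoryless source of variance 2

    print(average_distortion(white, 0.5), rate_bits(white, 0.5))
```

For a flat spectrum the two functions reduce to the familiar memoryless result, a rate of $\frac{1}{2} \log(\sigma^2 / D)$, which gives a quick sanity check on the integration.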

Remark 3: The transform domain "waterfilling" solution 
suggests that in practice the time-series data can be filtered 
for a desired level of fidelity (distortion) and privacy 
(leakage) using Fourier transforms. The privacy-preserving 
rate-distortion optimal scheme thus reveals only those frequency 
components with power above the water-level $\phi$. Furthermore, 
at every frequency only the portion of the signal energy which 
is above the water level is preserved by the minimum-rate 
sequence from which the source can be generated with an 
average distortion $D$. 
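
A crude version of this filtering idea can be sketched with a discrete Fourier transform: estimate the power in each frequency bin of a block and zero the bins that fall below the water-level. This hard-threshold filter is a simplification assumed for illustration. It captures only the "reveal components above $\phi$" part of the remark, not the additional per-frequency attenuation of the optimal scheme, and the periodogram normalization and function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real or complex block."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def inverse_dft_real(coeffs):
    """Inverse DFT, keeping the real part (input spectrum is assumed conjugate-symmetric)."""
    n = len(coeffs)
    return [
        sum(coeffs[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
        for t in range(n)
    ]


def suppress_low_power(x, phi):
    """Zero every frequency bin whose periodogram power |X_f|^2 / n is not above phi."""
    n = len(x)
    coeffs = dft(x)
    kept = [c if abs(c) ** 2 / n > phi else 0j for c in coeffs]
    return inverse_dft_real(kept)


if __name__ == "__main__":
    n = 16
    # Strong slow component plus a weak fast one (illustrative values only).
    load = [
        10.0 * math.cos(2 * math.pi * t / n) + 0.1 * math.cos(2 * math.pi * 5 * t / n)
        for t in range(n)
    ]
    print([round(v, 3) for v in suppress_low_power(load, 1.0)])
```

In the usage example the weak fast component falls below the threshold and is removed, while the strong slow component passes through unchanged.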

IV. Illustration 

The following example illustrates our results. We assume 
that the private information to be hidden is the measurement 
sequence itself, i.e., $Y_k = X_k$ for all $k$. For the meter 
measurements modeled as a stationary Gaussian time series 
$\{X_k\}$, we choose $X_k \sim \mathcal{N}(0, 1)$ for all $k \in \mathbb{Z}$, and an 
autocorrelation function 

$$c_m = \begin{cases} 1 & m = 0, \\ 0.3 & m = \pm 1, \\ 0.4 & m = \pm 2, \\ 0 & \text{otherwise.} \end{cases}$$

The power spectral density (PSD), the frequency domain 
representation of the autocorrelation function, of this process is given 
by 

$$S(\omega) = \sum_{m=-\infty}^{\infty} c_m \exp(i m \omega) = 1 + 0.6 \cos(\omega) + 0.8 \cos(2\omega), \quad -\pi \le \omega \le \pi. \tag{14}$$

In order to obtain the rate-distortion function $R_\phi(D)$ for this 
source, for a given $D$ we have to find the water-level $\phi$ 
satisfying (12). 

Figure 1 shows the PSD of $c_m$. Determining $\phi$ is 
equivalent to determining the height of the horizontal line 
such that the area below the curve and the line equals $D$ as 
given by (12). Having determined $\phi$, $R_\phi(D)$ is then given by 
(11). $S(\omega)$ takes its minimum value at $\omega_0 = \arccos(-\frac{3}{16}) \approx 1.7594$. 
Thus, for $D < S(\omega_0) \approx 0.1437$, $\phi = D$ such that 
$R(D) = \frac{1}{4\pi} \int_{-\pi}^{\pi} \log\left(S(\omega)/D\right) d\omega$, which is the same as the 
rate-distortion function for a Gaussian source with variance 
$\sigma^2$ satisfying $\log \sigma^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log S(\omega)\, d\omega$, i.e., when the distortion falls below 
a certain threshold, the rate required to reproduce the source 
at the receiver with the desired fidelity is the same as that of 
a memoryless Gaussian source. Finally, since we have chosen 
to hide the original meter measurements, for this problem the 
privacy leakage is the same as the rate-distortion function. The resulting 
tradeoff between leakage and distortion is shown in Fig. 2. 
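
The numbers in this example can be checked with a short numerical sketch. It evaluates the PSD in (14), locates its minimum, and finds the water-level for a target distortion by bisection on the average of $\min(S(\omega), \phi)$ over $[-\pi, \pi]$. The bisection, the midpoint integration, the base-2 logarithm, and the function names are choices made here for illustration rather than the procedure used to produce Fig. 2.

```python
import math


def psd(w):
    """PSD of the example process in (14)."""
    return 1.0 + 0.6 * math.cos(w) + 0.8 * math.cos(2.0 * w)


def band_average(f, steps=4000):
    """Midpoint-rule average of f over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    return sum(f(-math.pi + (i + 0.5) * h) for i in range(steps)) / steps


def distortion_at(phi):
    return band_average(lambda w: min(psd(w), phi))


def rate_bits_at(phi):
    return band_average(lambda w: max(0.0, 0.5 * math.log2(psd(w) / phi)))


def water_level(target_d, tol=1e-9):
    """Bisection for phi with distortion_at(phi) = target_d, for 0 < target_d <= 1."""
    lo, hi = 0.0, 2.4  # 2.4 = S(0) is the maximum of the PSD
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if distortion_at(mid) < target_d:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    w0 = math.acos(-3.0 / 16.0)
    print("minimum of S at", w0, "with value", psd(w0))
    for d in (0.1, 0.5, 0.9):
        phi = water_level(d)
        print(f"D = {d}: phi = {phi:.4f}, rate = {rate_bits_at(phi):.4f} bits")
```

Below the threshold $S(\omega_0)$ the water-level coincides with $D$; above it the water-level exceeds $D$ because part of the spectrum already lies under the line, and the rate drops to zero once $\phi$ reaches the peak of the PSD.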

V. Discussion and Concluding Remarks 

The theoretical framework that we have developed here 
allows us to precisely quantify the utility-privacy tradeoff 
problem in smart meter data. Given a series of smart meter 
measurements $X$, we reveal a perturbation $\hat{X}$ that allows us 
to guarantee a measure of both privacy in $X$ and utility in $\hat{X}$. 
The privacy guarantee comes from the bound on information 
leakage while the utility guarantee comes from the upper 
bound on the MSE distance between $X$ and $\hat{X}$. 

Fig. 2. Plot of $R_\phi(D) = \lambda_\phi(D)$ vs. average distortion $D$. 



Our model of privacy, namely information leakage, does not 
depend on any assumptions about the inference mechanism 
(i.e. the data mining algorithms); instead it presents the least 
possible (on average) guarantee of information leakage about 
X, while the utility is preserved in an application-agnostic 
manner. Our framework is also agnostic about how the per- 
turbation is achieved; for example it can be achieved using a 
filter such as a battery or by adding noise. 

Modeling a smart meter as a Gaussian source with memory 
and extending known results from rate distortion theory, we 
show that a utility-privacy tradeoff framework can be con- 
structed that gives tight bounds on the amount of privacy that 
can be achieved for a given level of utility and vice-versa. 
The critical parameter of choice in the utility-privacy tradeoff 
is the water level $\phi$, which in turn depends on the bound 
on the distortion that is acceptable. The choice of $\phi$ dictates 
the extent to which the original signal (meter measurements) 
can be distorted, and the rate $R_\phi(D)$ is the maximum data 
precision allowed for which information leakage is at most 
$\lambda(D(\phi))$. In a practical context, the choice of $\phi$ is dictated by 
the choice of the privacy-utility tradeoff operating point, which 
in turn has to be negotiated between the energy provider and 
consumer. 

Our distortion model can be viewed as a filter on the load 
signal X - it filters out all frequencies that have power below 
a certain threshold (determined directly by $\phi$). This filter is 
novel and comes directly as a result of our model. From 
a practical point of view, it makes sense in the following 
way. From the appliance signature chart in [1], frequency 
components that have low power typically correspond to 
fluctuations in energy consumption that are short-lived, which 
in turn are caused by appliances such as kettles and television 
sets and transmit the bulk of information about underlying 
human behavior. Frequency components that have high power 
tend to be caused by continuously running appliances such as 
air conditioning units and refrigerators that reveal much less 
about human behavior. Suppressing low power components 
would thus reduce or eliminate the components of the signal 
that are likely to be most revealing about human behavior 
and thus match our intuition on privacy protection in smart 
meter data. At the same time, our utility constraints guarantee 
that most of the useful energy consumption information is 
retained in the revealed load data. This holds out hope that we 
can reveal significant energy consumption information while 
at the same time protecting significant personal information 
in a tunable tradeoff. This would be an interesting avenue 
for further exploration. Another interesting avenue to explore 
would be to apply and demonstrate the power of these concepts 
in a practical context. 